Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)

Author

  • B. MUNI LAVANYA
Abstract

Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms, including polynomial evaluation, sorting, and building data structures. This paper introduces prefix scan and describes a step-by-step procedure for implementing it efficiently with Compute Unified Device Architecture (CUDA). It starts with a basic naive algorithm and proceeds through more advanced techniques to obtain the best performance.

Keywords: Scan, Parallel prefix sum, Prefix scan, CUDA, Parallel algorithms, Naïve algorithm
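The abstract does not spell out the naive algorithm, so the following is only a minimal sketch under an assumption: that "naive" refers to the commonly used Hillis-Steele formulation, which performs about log2(n) passes and O(n log n) additions in total. It is a CPU simulation in Python, not CUDA code and not the paper's implementation; the function names `naive_inclusive_scan` and `to_exclusive` are hypothetical. Each iteration of the inner loop stands in for the work one GPU thread would do in a pass.

```python
def naive_inclusive_scan(values):
    """Inclusive prefix sum in the Hillis-Steele style, simulated on the CPU."""
    current = list(values)
    n = len(current)
    offset = 1
    while offset < n:
        # Double buffering: every element of a pass reads the previous pass's
        # values, which is what a barrier between passes would ensure on a GPU.
        nxt = list(current)
        for i in range(offset, n):  # conceptually, one thread per index i
            nxt[i] = current[i] + current[i - offset]
        current = nxt
        offset *= 2
    return current


def to_exclusive(inclusive):
    """Shift an inclusive sum scan right by one, inserting the identity 0."""
    return [0] + inclusive[:-1] if inclusive else []


if __name__ == "__main__":
    data = [3, 1, 7, 0, 4, 1, 6, 3]
    inclusive = naive_inclusive_scan(data)
    print(inclusive)                # [3, 4, 11, 11, 15, 16, 22, 25]
    print(to_exclusive(inclusive))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

For the input `[3, 1, 7, 0, 4, 1, 6, 3]`, element `i` of the inclusive result is the sum of inputs `0..i`, and the exclusive variant omits the element itself.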


Similar resources

Parallel Compact Genetic Algorithm on CUDA-C Platform

This paper deals with the parallel implementation of the compact Genetic Algorithm on the Compute Unified Device Architecture (CUDA) platform of the GPU. We elaborate on implementation details for the parallel platform.


Parallel Optimized Algorithm for Apriori Association Rule Mining on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. The GPU (Graphics Processing Unit) has now taken a major role in high-performance computing for general-purpose applications. Compute Unified Device Architecture (CUDA) programm...


Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge because of the difficulty of scheduling hardware resources with respect to thread concurrency. In this paper, to resolve the problem, a novel method is proposed that parallelizes the GA by designing three concurrent kernels, each of which running some depe...


Image Based Virtual Dimension Compute Unified Device Architecture of Parallel Processing Technology

A hyperspectral image contains a number of typical virtual dimension targets. Determining the virtual dimension is the first step in many applications of hyperspectral images. Because the virtual dimension calculation method has high time complexity, and because the calculation is highly parallel, in this paper graphics processing unit (GPU) using the Compute Unified...


Parallel design of JPEG-LS encoder on graphics processing units

With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications in high-performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression that uses adaptive context modeling and run-length coding to improve the compression ratio. However, ad...




Publication date: 2014